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Abstract — We present an analysis of the Locally Competitive 
Algorithm (LCA), a Hopfleld-style neural network that efficiently 
solves sparse approximation problems (e.g., approximating a 
vector from a dictionary using just a few non-zero coefficients). 
This class of problems plays a significant role in both theories of 
neural coding and applications in signal processing. However, the 
LCA lacks analysis of its convergence properties and previous 
results on neural networks for nonsmooth optimization do not 
apply to the specifics of the LCA architecture. We show that 
the LCA has desirable convergence properties, such as stability 
and global convergence to the optimum of the objective function 
when it is unique. Under some mild conditions, the support 
of the solution is also proven to be reached in finite time. 
Furthermore, some restrictions on the problem specifics allow 
us to characterize the convergence rate of the system by showing 
that the LCA converges exponentially fast with an analytically 
bounded convergence rate. We support our analysis with several 
illustrative simulations. 

Index Terms — Locally Competitive Algorithm, sparse approx- 
imation, global stability, exponential convergence, nonsmooth 
objective, Lyapunov function. 



I. Introduction 

SPARSE approximation has generated substantial interest 
in a wide range of research communities over the last 
two decades, including signal processing, machine learning, 
statistics and computational neuroscience (e.g., see ||2|, p) and 
references therein). Specifically, sparse approximation involves 
solving an optimization problem to represent a signal using 
just a few atoms from some specified (possibly overcomplete) 
dictionary. In addition to describing compelling models of 
neural coding for sensory information |4|, |5|, this approach 
has led to state of the art results in many types of inverse 
problems. One example of a regime that has leveraged this type 
of signal model is the theory of Compressed Sensing (CS) |i^, 
||7) . In this domain, highly undersampled signals are recovered 
by solving a sparse approximation problem, thereby shifting 
the burden of data acquisition from the front-end sensor to a 
computationally intensive back end. 

Because of the increasing interest in the sparse approxi- 
mation paradigm, significant effort has been made to design 
efficient algorithms for solving (or approximately solving) 
least squares optimization problems that are regularized with a 
sparsity-inducing penalty (e.g., |8|-|10|). However, all of these 
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algorithms are developed to run on digital computers in dis- 
crete time, which are both implausible for neural systems and 
suffer from several drawbacks as engineering approaches. In 
particular, the computational time required by these algorithms 
presents a barrier to real-time signal processing applications 
with high dimensional signals at significant datarates. Specif- 
ically, digital algorithms tend to have storage requirements 
and convergence times that scale unfavorably with the dimen- 
sionality of the signals being approximated. Additionally, the 
power consumption of digital solutions can be prohibitive for 
many applications. Given these considerations, a fast and low- 
power method for solving sparse approximation problems in 
a parallel architecture would be valuable both for engineering 
systems and viable models of neural coding. 

Analog neural networks have long been proposed for solv- 
ing optimization problems |11|, with an early example being 
Hopfield's pioneering results | [T2| using networks to solve 
linear optimization problems. Analog neural networks offer 
several potential advantages over comparable digital algo- 
rithms, including their ability to be implemented in analog 
architectures that are highly parallel, fast and power efficient. 
Recent advances in VLSI reconfigurable analog chips |[T3| 
make the design of such systems more feasible and afford- 
able than has often been true in the past. A recent neural 
network architecture, called a Locally Competitive Algorithm 
(LCA) | [14] , has been proposed to solve the types of nons- 
mooth optimization problems that come up in sparse approx- 
imation. These Hopfield-style networks appear to efficiently 
solve a whole family of sparse approximation problems by 
incorporating ideal thresholding nonlinearities in the network 
dynamics. The results in |[T4| provide encouraging evidence 
that biological systems and engineering applications can use 
neural networks to solve these important sparse approximation 
problems. 

However, despite encouraging evidence of its performance, 
the LCA lacks strong convergence guarantees and estimates of 
the convergence rate. Furthermore, the specifics of the LCA 
architecture violate many of the assumptions used in previous 
analysis, making the extensive literature on neural network 
convergence inapplicable for this system. For example, in 
previous work the activation function is often assumed to be 
linear |15|, piecewise linear ||16|, or nonlinear but increas- 
ing and bounded |[T7|-|[20). In contrast, the LCA activation 
function is nonlinear and unbounded. Other analyses assume 



an interconnection matrix which is positive definite 1 16|, pT 



p2| or non-singular (23 1, p4) , while the LCA interconnection 
matrix may have negative eigenvalues as well as a non- 



trivial nullspace (due to the approximation dictionary being 
overcomplete). Other analyses consider a nonsmooth objective 
function with constraints, but only shows convergence when 
the constraints are nonzero and convex p5) , p6| . Finally, 
other relevant analyses assume the objective function is con- 
vex [21] or piecewise linear [281, while the work in |29| 
focuses on controllability of the network path. 

The main contributions of this paper are to present a formal 
analysis of the LCA network architecture and its convergence 
properties, despite 1) an activation function that is nonsmooth 
and not necessarily bounded or increasing, and 2) a potentially 
singular interconnection matrix. Section [III] contains our first 
main result, which states that the fixed points of the LCA 
network correspond to critical points of the objective function. 
In the special case where the objective is convex, this set 
coincides with the global minima of the objective. In addition, 
we show that the network is globally stable and that the outputs 
are quasi-convergent, in the sense that they get infinitely 
close to a set of fixed points. Finally, in the case where 
the objective function has isolated minima, we show that the 
LCA converges to the solution of the sparse approximation 
problem for any initial point (i.e., a much stronger condition 
than just the non-increasing property of the energy function 
that was shown in previous work [14]). This section also 
shows that the LCA is well-behaved in that it converges in 
a finite number of switches (i.e., nodes crossing above or 



below threshold). Section IV expands on these general results 
to show that, under additional mild conditions on the problem 
specifics, the LCA actually converges exponentially fast to the 
solution. Furthermore, we give an analytic expression for this 
convergence rate that depends on the properties of the detailed 
approximation problem. Finally, SectionMpresents simulation 
results showing the correspondence of our analytic results with 
empirical observations of the network behavior. 

II. Background 

Before presenting our main results, in this section we briefly 
give a precise statement of the sparse approximation problems 
of interest, a description of the LCA architecture, and some 
preliminary observations on the LCA network dynamics that 
will be useful in the subsequent analysis. 

A. Sparse Approximation 

As mentioned above, sparse approximation is an optimiza- 
tion program that seeks to find the approximation coefficients 
of a signal on a prescribed dictionary, using as few non-zero 
elements as possible. To fix notation, we denote the input 
signal by y G R^^ , the unit-norm dictionary elements by 
the columns of the M x TV matrix <1> = [$i, . . . , $Ar], and 
the coefficients by a G M.^ . Generally, M -^ N (i.e., the 
approximation dictionary is overcomplete), and the problem 
of recovering a from y is underdetermined. While similar 
in spirit to the well-known winner-take-all problem (30), 
sparse approximation problems are generally formulated as 
the solution to an optimization program because this approach 
can often yield strong performance guarantees in specific 
applications (e.g., recovery in a CS problem). In the most 



generic form, the objective function is the sum of a quadratic 
data fidelity term (i.e., mean squared error) and a regularization 
term that uses a sparsity-inducing cost function C(): 
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mm V (a(i)) = ^ ||y - Mil + A ^ CXa„), 



(1) 
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where the parameter A is a tradeoff between the two terms 
in the objective function. The "ideal" sparse approximation 
problem has a cost function that simply counts the number 
of non-zero elements, resulting in a non-convex objective 
function that has many local minima pTl. 

One of the most widely used programs from this family is 
known as Basis Pursuit De-Noising (BPDN) 1321, which is 
given by the objective function: 
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In BPDN, the ^i-norm (||a||j^ ~ J^n \'^n\) is used as a convex 
surrogate for the idealized counting norm. This program has 
gained in popularity as researchers have shown that, in many 
cases of interest, substituting the £i norm yields the same 
solution as using an idealized (and generally intractable) 
counting norm |33|. However, BPDN illustrates the canonical 
challenge of sparse approximation problems. Despite being 
convex, the BPDN objective function contains a nonsmooth 
nonlinearity that makes it considerably more difficult than a 
classic least-squares problem. 

In the context of computational neuroscience, sparse ap- 
proximation has been proposed as a neural coding scheme 
for sensory information. In one interpretation, programs such 
as BPDN can be viewed as Bayesian inference in a linear 
generative model with Gaussian noise and a prior with high 
kurtosis to encourage sparsity (e.g., the Laplacian prior in 
the case of BPDN) Q. Given the prevalence of probabilistic 
inference as a successful description of human perception | |34[ 
and the theoretical benefits of sparse representations |J3j7it 
has long been conjectured that sensory systems may encode 
stimuli via sparse approximation. In fact, in classic results, 
sparse approximation applied to the statistics of natural stimuli 
in an unsupervised learning experiment has been shown to 
be sufficient to qualitatively and quantitatively explain the 
receptive field properties of simple cells in the primary visual 
cortex pi, p5] as well as the auditory nerve fibers |36|. Only 
recently have there been proposals of efficient neural networks 
that could efficiently solve the necessary optimization prob- 
lems to implement this type of encoding fl?), p5|, |37|, p8). 



B. LCA structure 

Our primary interest will be the LCA p4| , an analog, 
continuous-time dynamical system that is a type of Hopfield- 
style network. In particular, each node in the LCA network 
is characterized by the evolution of a set of internal state 
variables, u„(t) for n = 1,...,N, and uses a nonlinear 
activation function T\{-) to produce output variables a„(t) 
for n = 1, . . . ,N. The activation function is typically a nons- 
mooth nonlinear function (such as a thresholding function) to 
induce sparsity in the outputs. The dynamics of the internal 
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Fig. 1. Block diagram of the Locally Competitive Algorithm (LCA) 1 14|, 
a Hopfield-style network architecture for solving sparse approximation prob- 
lems. 




Fig. 2. The soft-thresholding activation function. When this is used for the 
thresholding function Tx{-), the LCA solves the popular BPDN optimization 
problem used in many sparse approximation applications. 



State variables are governed by a set of coupled, nonlinear 
ordinary differential equations (ODEs): 

Tu{t) = -u{t) - ($^$ - /) a{t) + 
a{t)^Tx{u{t)) 



$^y 



(3) 



In this network, each node is associated with a single dictio- 
nary element $„, n = 1, . . . ,N, and the node outputs will 
be shown to be the solution to the optimization problem of 
interest. The architecture of a typical LCA is shown in Fig. [T] 

The inputs to the LCA network are feedforward connections 
computing the vector of driving synaptic inputs $^y, reflect- 
ing how well the signal y matches each dictionary element. 
The network also has recurrent inhibitory or excitatory connec- 
tions between the nodes, modulated by weights corresponding 
to the interconnection matrix G = $^$ — I (i.e., a modified 
Grammian matrix for the dictionary). The more overlap there 
is between a pair of nodes (characterized by the inner product 
between their dictionary elements), the stronger the potential 
inhibition between those nodes. While the modulating weight 
is symmetric between any pair of nodes, the total inhibition 
is not because it is also modulated by the activity of each 
individual node. Moreover, the matrix G potentially has both 
negative and positive eigenvalues, as well as a non-trivial 
nullspace. This inhibition structure ensures that nodes that 
carry the same information will not become active at the same 
time, thus meeting the goals of sparse approximation. The 
time constant r is dependent on the physical properties of 
the solver implementing the ODEs. For our purposes, we will 
often assume r = 1 without loss of generality. 

It was shown in [fT4T| that the objective function in ([TJ is 
non-increasing along the LCA trajectory with the following 
relationship (Va„ £ M such that a„ 7^ 0) between the cost 
penalty term C(-) and the activation function T\{-): 

dC{an) 
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Un -Tx(Un). 



(4) 



In the case of BPDN, the cost penalty is simply C{x) — \x\ 
and the activation function obtained from (|4]) is the soft- 
thresholding function (Fig. |2]i defined by; 

0, \Un{t)\<\ 

Unit) - A sign(u„(t)), \Un{t)\ > X 



ln[t) = Tx{Un{t)) = 



This function often arises in connection to algorithms for min- 
imizing the absolute value of the coefficients (e.g., see |39|). 
Generalizing from the soft-threshold, we focus on "threshold- 
ing" activation functions T\{-) of the form: 

0, \Un{t)\ < X 

/K(t)), \u„{t)\ > X ' 



a„(t) ^Txiun{t)) 



(5) 



where the function /(•) is a real-valued function defined and 
continuous on the domain V = (—00, —A] U [A, +00), differ- 
entiable on the interior of V, and satisfying the conditions: 



f{-Un) = -f{Un), 
/'(««) > 0, 
f{Un) < Un, 



\/un e V and /(A) = (6a) 
Vw„ e V (6b) 

Vm„ e T) S.t. Un > 0. (6c) 



Remark. Eq. ( |6a| l ensures that Tx{-) is continuous on all 
M and consequently, that u{t) and a{t) are continuous with 
respect to time. This ensures that the cost function C(-) is 
differentiable everywhere except at the origin. Eq. ( |6b] i makes 
/(•) a bijection on V. Finally, Eq. ( |6a] i and ( |6c] i ensure, by a 
simple computation, that C(-) is positive and non-decreasing 
with the absolute value of the coefficient. Notice that the £1- 
minimization objective function satisfies the three criteria in 
(|6]l, in addition to being convex (which is not required, but 
guarantees that the system will find a global minima). In 
Fig. I3] we plot a more generic stylized activation function 
that satisfies the conditions (|6]). 

As shown in Fig. |3] the activation function (J5| is composed 
of two operating regions. The first region corresponds to the 
case where the internal state u„ is below the threshold A, in 
which case the output a„ is zero. We call the nodes in this 
region inactive nodes. The second region corresponds to the 
case where the internal state w„ is above threshold, in which 
case ( |6b| l guarantees that the output a„ is strictly increasing 
with Un- We call the nodes that are above threshold active 
nodes, and we denote by F the active set (i.e., the set of 
indices corresponding to those nodes, F = {n E [i-,N] : 
\un{t)\ > A}). On the contrary, we denote the set of indices 
corresponding to nodes that are below threshold by F'^ and 
call them the inactive set. While these two sets do change 
■ with time as the network evolves, for the sake of readability 




Fig. 3. Generic activation function satisfying conditions J6). The area in 
gray represents wliere tlie activation function must lie in to satisfy conditions 
l6bland J631. 



we omit the dependence on time in the notation. 

Due to the nonsmooth nature of the objective ([T]), we require 
the generalized notion of a subgradient (see | |40| for more 
details). Note that V (a) is differentiable everywhere except at 
points a that contain zeros, due to the non-differentiability of 
C(-) at the origin noted above for the cases of interest in sparse 
approximation. The subgradient dV{-) extends the classic 
notion of a gradient VF() at those points of discontinuity. 
The properties in (|6| allow us to use the following simple 
definition: 



dVia) 



I < lim W{ai) : a.i —> a,ai ^ S, at ^ Vly \ 



where co is the convex hull, Oy is the set of point where 
V fails to be differentiable and S is any set of Lebesgue 
measure in M^. In other words, it is the smallest convex 
set containing the gradients of the function as we approach 
the discontinuity from any direction. Note that when a is not 
a point of discontinuity, this simply reduces to W{a). 

C. LCA Node Dynamics 

Because nodes can cross threshold and go from the inactive 
set to the active set (and vice versa), the LCA can be thought 
of as a switched system Hi] where the ODEs change at each 
switching time (i.e., when a node crosses above or below 
threshold). In between two switching times, the active set F 
and inactive set V^ are fixed. Therefore, the dynamics on the 
two sets can be considered separately to facilitate analysis. 
In the following, $7- is the matrix composed of the columns 
of $ indexed by the set T. Similarly, uj- and aj- refer to the 
elements in the original vectors indexed by T- We also denote 
by {tfclf/cGN} the sequence of times for which the system 
switches from the set of active nodes Tk-i to F^. The notation 
Tx{u{t)) refers to the vector [Tx{ui{t)), . . . ,Tx{uN{t))f , 
and we denote by F'it) the associated Jacobian matrix with 
respect to the internal state variables u. Because the activation 
function has the form in (|5]l, the matrix F' [t) is diagonal with 
diagonal elements equal to zero for indices in the inactive set 
F^ and equal to f'{un) for indices n in the active set F. 

Using the chain rule and (|5]l, we can calculate the derivative 



with respect to time of the outputs on the active set F: 

dr(t) - F{.{t)ur{t) = diag(/' K(i))) wrW- (7) 

As a consequence, the ODE (|3]l can be rewritten for nodes in 
the active set as follows: 

ar{t) = F^{t) [-ur{t) + ar{t) - $f $rar(t) + <^fy] ■ (8) 

The active nodes follow this ODE until the next switch occurs, 
changing the sets of active and inactive nodes. Similarly, we 
can rewrite the set of differential equations acting on the 
inactive nodes. Since the output of the activation function for 
nodes in F^ is zero, the ODE Q on the inactive set becomes: 



iir^it) = -wrc(t) - ^fc^rarit) + ^r^U- 



(9) 



Separating the system dynamics into two separate character- 
izations for the active nodes F and inactive nodes F'^ yields two 
sets of differential equations that are partially decoupled, and 
this will be crucial for our subsequent analysis. This partial 
decoupling is possible because only the active nodes in the 
system produce inhibitory signals, coupling their dynamics 
to the dynamics of each node in the inactive set via the 
interconnection matrix. However, because inactive nodes do 
not inhibit active nodes, the dynamics on the active set are 
independent of the inactive set. 



III. Convergence Results 

In fT4^, the authors show that the LCA network trajectory 
is non-increasing on the energy surface corresponding to the 
desired objective function. While this is necessary behavior 
for a network that solves an optimization problem, it is not 
sufficient to actually show that the network converges to a 
fixed point (or set of points), let alone reaches a minimum 
of that desired objective function. Both of these are impor- 
tant guarantees to make before relying on this system as a 
viable model of neural processing or as an implementation in 
engineering applications. 



In Section III-A we analyze the general convergence prop- 



erties of the LCA. In particular, in this section the intercon- 
nection matrix <I>^$ — / may have many negative or zero 
eigenvalues, which complicates the analysis. The main results 
of this first section show that the fixed points of the LCA 
correspond to critical points of the objective ([T]) and the 
outputs of the network converge from any initial state to the 
set of fixed points. In the case of isolated critical points of ([T]i, 
we show that the LCA is globally convergent in the sense that 
starting from any initial state, the system converges to a fixed 



point. Elaborating on these results. Section III-B shows that 



the LCA (even under very general assumptions) is very well- 
behaved, converging in a finite number of switches (i.e., nodes 
crossing above or below threshold) and therefore recovering 
the solution support (i.e., the set of non-zero elements) in finite 
time. Due to the difficulties described above, our approach 
relies on classic results from nonsmooth analysis, such as 
subgradients and a generalized chain rule (see Appendix [A|. 



A. Global Asymptotic Convergence 



To begin, we recall some useful definitions of stability 
and concepts from Lyapunov theory (see f42] for more back- 
ground). There exist several notions of stability associated with 
dynamical systems that describe the evolution of the nodes or 
their outputs both locally and globally. For a neural network 
described by a differential equation of the form 



X — G{x), X G 



l,N 



(10) 



with outputs z = T\{x) defined as in (|5]l, we say that a 
constant vector x* e M.^ is a fixed point of ( fTO] ) if and only 
if 

G{x*)^0. (11) 

On the other hand, the outputs of a system reach a critical 
point a'^ of the objective function V{-) when they satisfy the 
inclusion: 

e dV{a'=). (12) 

Note that if y(-) is convex, then the critical points correspond 
exactly to the global minima of V. We say that the system 
(fTO| is (Lyapunov) stable at x* if for each e > 0, there exists 
R > such that for all xq with ||xo — x*|| < R, and all the 
solutions x{-) with initial state xq, 



\\xit) 



< e, Vi > 0. 



(13) 



This is clearly a local notion of stability, guaranteeing that 
once the trajectory is close to a fixed point, it remains nearby. 
But this type of stability is insufficient to guarantee global 
convergence, which means that trajectories approach a fixed 
point as time goes to infinity. The following notion of stability 
is slightly stronger, guaranteeing that trajectories approach at 
least a set of fixed points regardless of the initial state. 

Definition 1. We say that the outputs of ( [TO] i are globally 

quasi-convergent if there exists a set £ = {z* e M.^ : z* = 
0} such that for all xq G M.^, the outputs z(-) = T\ {x{)) 
with initial state xq satisfy lim z{t) G £. 

Finally, the strongest and most desirable form of conver- 
gence, which guarantees that the nodes converge to a single 
fixed point, is stated in the next definition. 



Definition 2. We say that ( [Jop is globally convergent, or 
equivalently globally asymptotically stable, if there exists a 
fixed point x* at which the system is stable, and if for all initial 
states Xq G M^ the solutions x{-) satisfy: lim x{t) — x* . 



Typically, global convergence is established through the use 
of a so-called Lyapunov function V . The notation V refers to 
the derivative with respect to time, i.e. dV/dt. 



Definition 3. A function V : 
function on M.^ if: 

(i) V{x) > 0, Va; ^ 0; 
(a) V is continuous on 
[Hi) V{x) < 0, Vx G K^; and 
(iv) V is radially unbounded: 



N 



H^ 



is a weak Lyapunov 



fN. 



lim V{x) = +0O. 



Similarly, a function is called a strict Lyapunov function if 

it meets the above conditions, with the exception of having a 
strict inequality in condition (Hi) (i.e., V{x) < 0,Va; G M^). 

Remark. When it is possible to find a weak Lyapunov function 
for a given dynamical system, the first theorem of Lyapunov 
|42| guarantees that any fixed point of the system is stable 
in the sense of ( [T3] l. However, to show global convergence of 
a system, the second theorem of Lyapunov requires a strict 
Lyapunov function. 

One can check that the objective function in ([TJ is a weak 
Lyapunov function for the system (|3]l, thus guaranteeing that 
the LCA is Lyapunov stable (which improves on the stability 
result obtained in |14|). However, it is not a strict Lyapunov 
function. Indeed, ([T]) only depends on the active nodes, mean- 
ing that it could stop decreasing while subthreshold nodes are 
still evolving. Continued evolution of the inactive nodes could 
cause a node in F^ to become active, thereby causing the 
objective function to start decreasing again. As a consequence, 
condition (Hi) in Definition [3] is not satisfied with strict 
inequality, and the standard approach of using the objective 
function as a Lyapunov function is not sufficient to show 
global convergence of the LCA. To show global convergence 
of the system to a fixed point, it is necessary to account for 
the dynamics on both active and inactive nodes. 

Our main convergence theorem guarantees the global quasi- 
convergence of the LCA towards a set of critical points of the 
objective function. In the case of isolated critical points, this 
implies global convergence. 

Theorem 1. The LCA system defined in (pi), with an activation 
function of the form (pi) satisfying conditions ^: 

1) has fixed points that are critical points of the objective 

function defined in (ITJ; 
2} has globally quasi-convergent outputs; and 
3) is globally convergent, provided that the critical points 

of ([T]l are isolated. 

Note that part 2 of the above theorem only relates to the 
output variables, while part 3 is a much stronger condition on 
the entire dynamical system (including subthreshold states). 
In the highly relevant case of convex objective functions with 
unique minima (e.g., BPDN), part 3 of the above theorem 
applies directly and we have the strongest possible notion of 
convergence. In fact, for most dictionaries $ (e.g., random 
Gaussian), the minimum of the £i -minimization objective 
function (|2]i is unique |43J. Recent results on subgradient 
dynamical systems |[26|, ||44) lead us to believe that part 3 
could still hold in the case where the fixed points of the 
system are not isolated and there exists a subspace of solutions 
to ([TJ, but this conjecture is beyond the scope of this work. 
Section |V-A| illustrates the convergence behavior of the LCA 
in simulation. The proof of this theorem is in Appendix IA] 
and relies on generalized notions of a subgradient due to the 
nonsmooth nature of the objective. 



B. Convergence in a Finite Number of Switches 

In the theorem below, we strengthen the previous resuh 
to estabUsh that under some mild conditions, the support 
of the final solution is reached with a finite number of 
switches. To prove this result, it is sufficient to assume that 
no node in the solution lies exactly on the threshold A. This 
assumption precludes unwanted infinite oscillation behavior on 
the boundaries, known as Zeno behavior. In other words, we 
assume that there exists a margin (r > 0) above and below 
the threshold which contains no node in u* . One would expect 
this condition to hold with near certainty for any signal that 
was not pathologically constructed. 

Theorem 2. // the system (|3]l converges to a fixed point u* 
such that there exists r > 0." 

Vner,, |<|>A + r 
Vner^, |<| < A-r ' 

then the system converges after a finite number of switches. 
This implies that the neural network recovers the support of 



the solution a* in finite time. Section V-A explores the number 
of switches during the convergence of the LCA in simulation. 

Proof: Let F* be the set of active nodes in u* . By 
contradiction, assume that the sequence of switching times 
{^fc}feeN i^ infinite. Since the LCA converges to u* , we have: 

u{tk) — > u* . 

As a consequence, for r > 0, there exists ii' € N such that 
V/c > K, \\u{tk) — u*||2 < r. To begin, we show that for all 
k > K, the state variables u{tk) are in the subsystem F*. For 
all k > K, we have two cases: 

• Nodes that are above threshold in u* are above threshold 
in u{tk)- Indeed, Vn G F*, we have: 

r > \un{tk) - <| > Kl - \unitk)\ > A + r - |u„(ife)| 

=> \Un{tk)\ > A. 

Moreover, nodes are active with the correct sign, other- 
wise, we would have: 

r > \un{tk) - <| = \un{tk)\ + Kl > A + A + r 
=^ 0> A, 

which is a contradiction. 

• Nodes that are below threshold in u* are below threshold 
in u{tk)- Indeed, Vn e F^, we have: 

\un{tk)\-X < |u„(ife)|-K|-r < \un{tk) - ul\-r < 

^ \Un{tk)\ < A. 

As a consequence, for all k > K, F^ = F*. However, Ffc 
and Tk+i must be different to define the switching time tk+i 
and we reach a contradiction. This proves that after a finite 
number of switches K, there cannot be any switching out of 
subsystem F*. ■ 



IV. Exponential Convergence Rate 

With Theorem [T] showing that the LCA is globally conver- 
gent, the most pressing issue remaining is to determine the 
convergence rate of the network. Such a bound will be espe- 
cially important for implementations, which must guarantee 
solution times. In this section, we show that under some ad- 
ditional conditions on the problem's specifics (given in ([T6|), 
the LCA network converges exponentially fast to a unique 
fixed point u*. We also give an analytic characterization of 
the convergence speedPl 

To begin this section, we recall the definition of exponential 
convergence: 

Definition 4. The dynamical system in ( fTO] ) is exponentially 

convergent to the solution x* if there exists a constant c > 
such that for any initial point x(0), there exists a constant 
Kq > (which may depend on x{0)) for which the trajectory 
x{t) of the system satisfies: 



\\x{t) — a;*|| < kqc 



Vt>0. 



The constant c is referred to as convergence speed of the 

system. 

In order to state the main theorem of this section, we define 
the two following quantities. The first constant is denoted by 
a and provides a bound on the derivative of /(•) in (|5]l: 

Vi>0,Vn=l,...,iV \f'{unit))\<a. (14) 

Note that the constant a is always well defined since the trajec- 
tories Un{t) are bounded and the function /(•) is continuous. 
The second constant, denoted by 5, is the smallest positive 
constant such that for any active set F visited by the algorithm 
and any vector x in K^ with active set F = F U F*, where 
F* is the active set of the solution to ([T]i, we have: 



(l-<5)||x||^<||<J,x||^<(l + 5)||x||^ 



(15) 



The constant S depends on the singular values of the matrix 
$P and on the sequence of active sets visited by the system. 
It may not be well defined for any matrix $ or any input y. 
However, in many interesting cases in CS, the constant S is 
close to and the dictionary elements are almost orthogonal 
for any small enough active set |45|. The following theorem 
shows that this constant directly relates to the convergence 
speed of the neural network. 

Theorem 3. Under conditions (|6]l on the activation function 
in (|5]l, and provided that the constants a and S, defined in 
( |14[ ) and ( |15| l respectively, satisfy: 



6 < min 1, ^ 

V (2a- if. 



(16) 



the LCA system defined in (|3]l is globally exponentially 
convergent to a unique equilibrium, with convergence speed 

{l-S)/2r. 

If condition ([T6| is satisfied, the expression given for the 

'We reintroduce the time constant r in this discussion of the LCA, since 
it appears in the expression for the convergence speed. 



convergence speed is positive and thus meaningful. It depends 
on the eigenvalues of the matrix $p ^r, which vary with the 
active set T. A careful analysis of the sequence of active 
sets visited by the network is required to obtain a good 
estimate of S. Since such a study is application dependent, 
we do not address this question here. Note that in the very 
interesting case of the soft-threshold function, a — 1 and so, 
condition ^T6\ reduces to 5 < 1. The time constant r of the 
physical solver implementing the LCA neural network appears 
in the expression of the speed of convergence. Lowering this 
time constant means the system will converge faster. Analog 
systems can have smaller time constants than their digital 
counterparts that scale better with the problem size. Section 



V-B explores the convergence rate bounds for the LCA in 
simulation. 

To establish the expression of the convergence speed, we 
use the following Lure-Postnikov-type Lyapunov function: 



1 



N 



n=\ 



S„(t) 



i^W = 2 II^WIl2 + 1^ / 9Mds. (17) 



where we again redefine the output and state variables in terms 
of the distance from any arbitrary fixed point u* 



Un{t) = Un{t) 

a«(i) = an{t) - 



< == T\{Un{t) + <) - rA«), 



(18) 



and the function 5„(s) = Tx{s + m*) — r\(u*), which 
captures how much a perturbation of the internal state affects 
the output. Because the second term in ( [T7] i is non-negative, 
we have trivially that this energy function bounds the mean- 
squared error of the current state values to their steady-state 

values, - ||u(t)||2 < E{t). The theorem will prove that u* is 
unique. The properties presented in the following Lemma (see 
Appendix IS] for proof) are useful to prove the main result of 
this section. 

Lemma 1. Assume that the activation function (|5]) satisfies 
the conditions (|6]). Then, the set of variables u and a defined 
in ( |18[ ) satisfy the properties: 

(i) sign(a„)= sign(u„). 
(ii) \an\ < a\un\ ■ 
(Hi) Oj-aj- < auj-aj- < a^Uj-uj- for any T 

{in particular for T = V). 

f-S„(t) 



{iv) 



N 



gn{s)ds < u a. 



Armed with this lemma, we can now give the proof of the 
main theorem. 

Proof of Theorem pi- We begin by noting that since u* 
and a* are the internal state variables and the outputs at a 
fixed point of (|3]l, we can rewrite the dynamics in terms of 
the new variables: 



ru(i) = ~u{t) - ($^$ - /) a{t). 



(19) 



Using the chain rule, we find the expression for the deriva- 
tive with respect to time of the energy function ( [T7| ): 

rEit) ^ /^^W) "^^ . (u(t) + a{t)YrW)t. 



du{t) 



dt 



Using ([T9|, this becomes: 

TE{t) = - {u{t) + S(t)) {u{t) + ($^$ - /) S(i)) . 

Using the properties in Lemma [T] leads to the desired expres- 
sion (and removing the term t to increase readability): 



TE{t) = -\u + a 



$'$-/ S 



= (u + of (-M + a - $^$a) 



-u^u + a^a ~ a'^^'^^a - u^^'^^a. 



Adding and subtracting the term -u$^$m, results in: 

TE{t) = -u'^u + a^a - a^^'^^a + - {u - of $^$ (u - a) 

1_ ,^__, rp ^__^ ±_ rj-\ rp 

2 2 

= -TFu + Z^a-^- W^aWl + ^ ||$ {u-a)\\l - ^ \\^u\\l . 

We separate the vectors into their components onto the active 
set r = r U r* of a and its complement F^: 

1 
2 



* (« - a)\\l - o Il*"ll2 = o ll^r ("r " «?) II2 + o ll*r="rc|| J 



2 " "^ 2 II ^^ ' ^^112 2 

1 ll^~l|2 1 11^ /- ~M|2 1 11^ ~ ||2 

- 2 Il*"ll2 = 2 Pr ("r - «)ll2 - 2 ll^r^rL ■ 
We can now use the definition of 5 in ([TSj: 

TE[i) < -u^u + a^a- -{1-S)\\a\\l 



1 ||_ _||2 1 ,,_ ||2 

+ 2^1 + '^) Fr^'^L - 2*'^"'^-' ll^rL 



(1 - S)u'^u - (1 - S)u'^a - ^ (1 - S)u'^u^ 



26u~a + 2Sa^a - ^{l + S) ||up 



2 



< --{I - 6)u^u - {1 - S)u'^a - -(l-^)S^Uf 



r"r 



25u~a + 2Sa^a. 



Now, using property (iii) of LemmafT] the expression becomes: 



rEit)<-lil-5)\\u\\l-il-S)u''a 

-^(l-S)-26 + 2Sa 
2a 

-lil-5)\\u\\l-il-S)^a 



u~a 



1 
2^ 



u~a. 



l-5{2a-lY 
From ( |T6] l, we know that 1 — 5 {2a — 1) > 0, and so: 



1 



TE{t)<--{l-S)\\ur^-il-d)u'a. 



Finally, using property (iv) of LemmafTland the fact that 6 < 1, 
we arrive at: 



1 



N 



rE{t)<--{l-6)\\u\\l-{l-S)J2 



_i ^0 



n=l 



At) 



gnis)ds 



= -{l-S)E{t). 
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Fig. 4. Plot of the evolution with respect to time of several LCA nodes 
Ukit). The plain lines con'espond to nodes that are active in the final solution 
and the dashed lines correspond to nodes that are inactive in the final solution. 



Knowing already that E{t) is an upper bound on the quantity 
of interest, we can integrate both sides of the above inequahty 
(i.e., apply Gronwall's inequality) to get a bound on E{t), 

\ \\u[t) -u*f^< E{t) < e-(i-^)*/-i?(0), (20) 

proving that the LCA is exponentially convergent to a unique 
fixed point u* . Taking the square root, this expression also 
shows that the convergence speed in the exponential is 

(1 — S) /(2r), as stated in the theorem. 



V. Simulations 

In this section, we illustrate the previous theoretical results 
in simulation. Each plot is based on the following canonical 
sparse approximation problem. We generate a "true" sparse 
vector ao G K^, with A^ = 512 and s = 5 non-zero entries. 
We select the locations of the nonzeros uniformly at random 
and draw their amplitudes from a normal gaussian distribution 
(then normalizing to have unit norm). We choose a dictionary 
$ as a union of the canonical basis and a sinusoidal basis 
having dimensions M x N with M = 256. The vector of 
measurements is y — $ao + i] E R^^ , where r/ is Gaussian 
random noise with standard deviation 0.0062. We use an LCA 
with a soft-threshold activation function, with a threshold set 
to A = 0.025 and u(0) = 0. We simulate the LCA dynamical 
equations (|3]l through a discrete approximation in Matlab with 
a step size of 0.001 and a solver time constant chosen to be 
equal to t = 0.01. 



A. Convergence Results 

From Theorem [T] the LCA should converge and recover 
the solution to the sparse approximation problem Q, which 
has a unique minimizer Since the signal oq to recover is 
sparse, the outputs of the neural network are expected to 
converge to a solution close to the initial signal ap. Fig. HI 




100 200 300 400 500 
node number 

Fig. 5. Output a* of the LCA after convergence. Only non-zero elements 
are plotted. The fixed point reached by the system is very close to the initial 
sparse vector used to create the measurement vector (it cannot be exact due 
to noise). The solution is also very close to a standard digital solver 1 8 1 nin 
using the same inputs. Note that the LCA produces many coefficients that are 
exactly zero (therefore not plotted). 



shows the evolution of a few nodes w„(i) selected at random. 
We see that both active and inactive nodes converge relatively 
quickly. Fig. l5] shows the fixed point reached by the LCA 
system and compares it to the initial signal ao and to the 
solution obtained using a standard digital solver for the same 
optimization program 18) . The solution reached by the network 
possesses exactly 5 non-zero entries that correspond to the 
non-zero entries in oq. The recovered amplitudes are very 
close to the initial amplitudes (it cannot be exact due to the 
added measurement noise) and to the ones produced by the 
reference digital solver. However, the LCA produces a sparse 
vector while the digital solver returns many small but non-zero 
entries that would have to be removed by postprocessing. 

To illustrate the global convergence behavior, we also ran 
the LCA for 30 randomly generated initial points. We selected 
two nodes from the final active set and plotted the trajectories 
in the space defined by those two nodes. Fig. l6] clearly shows 
that the solution is attractive for any of those initial points. 

To illustrate the number of switches used by the system 
(see Theorem l2]l, we generate 1000 sparse vectors oq and 
measurements y and simulate the LCA dynamics. Fig.lTlshows 
a histogram of the number of switches needed for the system 
to converge. The figure illustrates that the number of switches 
before convergence of the neural network is finite and of the 
order of the dictionary size. This illustrates that this solver 
takes an efficient path towards the solution. 

B. Convergence Rate Results 

To illustrate the convergence rate result in Theorem [3] it 
is necessary to find an expression for the convergence speed 
(1 — 6)/t that appears in the exponential term in ( |20l l. This 
term bounds the error squared: 

\\\u{t)-u*\\l, 




10 



0.2 0.3 0.4 0.5 0.6 0.7 

Fig. 6. Trajectories of u(t) in the plane defined by two nodes cliosen 
randomly from the active set. Trajectories are shown for 30 random initial 
states. 
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Fig. 7. Histogram (in percentage) of the number of switches the LCA requires 
before convergence over 1000 trials. 



which is normahzed to have initial value at t = of 1 and plot- 
ted using a log-scale on Fig. [8] Note that the constant S defined 
in ([TSj depends on the sequence of active sets Ffe visited by the 
system. However, it is very difficult to predict for a given input 
signal what sequence of active sets the algorithm is going to 
visit. To estimate this upper bound, we compute the constant 
5 using the matrix ^r, composed of the dictionary elements 
that are active in the final solution. The corresponding upper 
bound on the decay e"*^^""^ '*/^ is plotted in Fig.jsj During the 
transient phase, the number of active nodes is actually larger 
and thus, we expect 1 — i5 to be smaller than this estimate 
and the nodes to converge slower at some point during the 
transient phase. As a consequence, we keep track of the largest 
support visited by the network and compute the corresponding 
S. This second upper bound e^'^^*""")*/^ on the convergence 
rate is plotted on the same figure. As expected, the theoretical 
decay computed with the maximum support visited is an upper 
bound for the convergence speed. However, it can be seen that 
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Fig. 8. Convergence behavior of = ||"(i) ^ 'i*ll2- The dashed line .shows 
the theoretical decay in |20} with o* computed by using the final solution 
support. The dash-dot line shows the theoretical decay with (5jnax computed 
on the largest support visited. While both estimates are showing theoretically 
correct behavior, the estimate of the rate based on 5* is more empirically 
accurate than the conservative estimate based on (5max. 



this estimate is very pessimistic and that the bound computed 
with the final active set is a better estimate. This simulation 
illustrates that the theoretical exponential convergence appears 
to capture the essential system behavior. 

VI. Conclusion 

This paper presents a mathematical analysis of the conver- 
gence properties and convergence rate of the LCA, a neural 
network designed specifically for the challenging sparse ap- 
proximation problem. Despite a nonsmooth activation function 
and possibly singular interconnection matrix that prevent the 
application of existing analysis approaches, we have shown 
that the system is globally convergent to the optimal solution. 
In addition, under some mild assumptions on the solution, we 
have shown that the trajectories follow a reasonable path and 
reach the final active set in finite time. Finally, under slightly 
stronger assumptions on the problem specifics (applicable at 
least in CS recovery problems), we established that the LCA 
is exponentially convergent and with a convergence rate that 
depends on problem-specific parameters. 

This collection of results and analysis leads us to conclude 
that performance guarantees can be made for the LCA system 
that make it plausible for implementation in engineering 
applications and as a model of biological information pro- 
cessing. Indeed, providing such guarantees makes it easier to 
justify the expense associated with developing analog VLSI 
implementations, which could eventually result in significant 
improvements to the speed and power consumption necessary 
for real-time signal processing applications. Our future work 
will concentrate on finding reasonable estimates of the theo- 
retical convergence speed (especially in well-studied special 
cases such as CS recovery), and further characterizations of 
the LCA system dynamics that may open up new applications 
of this system for time-varying input signals. 
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Appendix A 
Proof of Theorem[T] 

Building on the earlier definition of a subgradient to give 
a notion that non-differentiable functions can still be well- 
behaved, a function g : X ^ M. (where X is a Banach space) 
is said to be regular pO) Def. 2.3.4] at x in X if 

(i) For all v G X, the usual one-sided directional derivative 

g{x + tv) - g{x) 



q'(x:v) = lim ■ 



t 



exists. 



(ii) For all v G X, g'{x; v) — g°{x; v), where g°{x; v) is the 
generalized directional derivative 

g{y + tv) ~ g{y) 



g°{x]v) — lim sup ■ 
40 



t 



It can easily been seen that, since the function C() is defined 
from K to M, is differentiable on M\{0} and the function 
Tx{-) is continuous on all M, then C(-) admits left and right 
derivatives and is clearly regular on M. This implies that V{-) 
is regular on M^, and by |40, Prop. 2.3.3] we have that: 



dV{a{t)) = -$^y + $^$a(i) + XdC{a{t)), 



(21) 



where dC{a{t)) = [aC(ai(t)), . . . , aC(aAr(t))] We also 
recall the following result pO) Th. 2.3.9 (iii)], which is a 
generalization of the chain rule for regular functions. 

Lemma 2. Suppose that V{a) : R^ -^ R is regular in M^ 
and that a{t) : [0, +00) — > M^ is strictly differentiable on 
[0, +00). Then, V{a{t)) is regular on M , and we have 

d_ 
dt 



-V{a{t))=(^a{t) VCeay(a(i)). (22) 



Note that from this theorem, since V{a{t)) is regular, 
we can choose any element in dV{a{t)) to compute the 
time derivative of V{-) along the trajectories of the neural 
network. Armed with these tools, we proceed with the proof 
of Theorem [T] 

Proof of Theorem U\ Beginning with part 1 of the 
theorem, we first show that any fixed point of system (|3]l is 
a critical point of the objective in (fill. From ( fTT) , any fixed 
point u* of ([3]) satisfies the relationship u{t) ~ 0. Let F* be 
the active set at the fixed point u* . Eq. ([8]l and (|9]) yield: 

-M^^+/(4J-$f$ra^^+$f^2/-0 (23) 

- Mpc - $f c $r. a^. + $rs y = 0. (24) 

According to ( [T2] l, and using ( |2T| l, a point a* is a critical point 
of \^(-) if and only if: 



$^y-$^$a* G \dC{a*). 



(25) 



For the nodes in the active set n e F*, C{an) admits a usual 
gradient, as defined in Q and thus, ( p5] l yields 



This is the same as condition ( [23) . For nodes in the inactive 
set F^, we need to determine the subgradient of C() at zero. 
To do so, note that using (|4| and the continuity of the function 



/(•) at A (condition (|6a]i), we have: 

\miXsJC{a) = limw - f{u) = A. 

As a consequence, we find that: 



limVC(a) = 1 



and 



WiaV C(a) = -1. 



From the definition of the subgradient, these limits show that: 
9C(0) — [—1,1]. Using this, the condition in ( |25| ) restricted 
to the set of inactive nodes is equivalent to: 

$rcy-$fc$r.a^. e [-A, A]. 

We immediately see that this condition is the same as ( |24] i, 
since by definition of the inactive nodes Upc G [—A, A]. This 
shows that the fixed points a* coincide with the critical points 
of the objective function ([T]i. 

Moving on to establishing the convergence result in part 2 
of the theorem, we first note that from our analysis above, the 
same reasoning leads to the conclusion that: 

-u{t) = u(t)-/(u(t)) + $^$ a(i)-$^y e dV{a{t)). (26) 



Using the chain rule ( |22| with C, ~ —it-it) G dV{a{t)) from 
( p6| ) and using the expression in (|7]i, we get the relationship: 

dV{a{t)) 



dt 



-u{t)^a(t) 
-u{t)^ F' {t)u{t) 

-Y.r{Un{t))\Ur.{t)\\ 



(27) 



This expression is valid for all time i > 0. In addition, because 
the function /(•) satisfies ( |6b] i, /'(u„(i)) > 0, and thus 

dV{a{t)) 



dt 
dV{a{t)) 

Jt 



< for all t > 0, and 



< for all < > such that |lMr(i)|l2 ¥" 0- 



This means that V{a{t)) is non-increasing for all t > 0. 
Since V{a{t)) is continuous, bounded below by zero, and non- 
increasing, V{a{t)) converges to a constant value V*, and its 
time derivative V{a{t)) tends to zero as t ^ oo. Using equa- 

^ I^r(i)|l2 = 0. 



tions ( 27 1 and nm, we conclude that: lim 

As a consequence, we also have lim ||a(t)|L = 0, so the 

i— f+00 

outputs converge to the set E ~ {a : s.t. d{t) = 0} , and the 
LCA outputs are quasi-convergent. 

Moving on to part 3 of the theorem, we assume that the 
critical points of ([TJ are isolated. We need to show that both 
active and inactive nodes converge to a single fixed point. 
From part 2, we know that since the nodes converge to E, 
after some time, the active nodes will be within a ball of 
radius R around one element a* E E. However, since critical 
points of ([TJ are isolated, there exists a ball B^ {a* ) of radius 
e > around a* that does not contain any critical point: 



eE 



and 



Va e Se(a*), a 7^ a* 



E. 



Since the system is stable, we know that once the trajectory 
gets close enough (within a ball of radius R) to one element in 
E, it cannot leave a ball of radius e around this fixed point (see 
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([T3|l). As a consequence, the outputs remain within the ball 
Be{a*), which contains only the fixed point a*. This proves 
that the active nodes converge to the point a* in E: 



lim a(t) 

t-f+oo 



(28) 



This implies that the ODE (|3]l can be written in terms of the 
distance a{t) = a{t) — a* of the outputs from the solution: 

u{t) = -u{t) - $"^$0* + $^y + a* - $^$S(i) + a{t). 

We let 

u* = -$^$a* + $^y + a*, 

and rewrite the ODE ([3]) as: 

u{t) = ~u{t) + u* - ($'^$ - /) a{t) 
Solving this ODE for all t > yields: 



u{t) 



KO)- 



($^$-/)5(s)ds. 



While it is difficult to say anything directly about the trajectory 
of the system, it is helpful to consider a surrogate trajectory 
that is a straight line in the state-space: u* + e^* (""(0) — u*). 
This linear path obviously converges to the fixed point u*, 
and if we are able to show that the actual trajectory u{t) 
asymptotically approaches this idealized linear path, we will 
have established that the system converges to u* . To take this 
approach, we examine the quantity 

h{t) = u{t) -u* ~ e"* (w(0) - u*) , 

which is the deviation from the linear path. Consider the norm 
of this deviation: 

\\h{t)\\, = \\u{t)-u*-e-'{u{0)-u*)\\ 



($^$-/)a(s)ds 



< e" 



'0 



$^$_/ 



|a(s)||2 ds. 



To show convergence to zero, we split the integral into two 
parts. Since a{t) — > 0, then for any e > 0, there exists a 

time tc > such that Vt > tc, ||a(t)||2 < e. Moreover, since 
||a(t)|J2 is continuous and goes to zero as t goes to infinity, it 
admits a maximum /i, Vt G M. This yields, for all t > 2tc. 

\\h{t)\\. 



<e^* $^$_/ 


2^ / e^ds 
Jo 


rt 




+ e-* $^$-/ ^e / e'ds 




Jtc 




+ $^$-/ ^e[l-e-'+'^' 




< $^$-/ 2/i 


e-*/2 - e-' 


+ $^$- 


-I 2^ 



Since the left term converges to and e can be chosen 
arbitrarily small, this shows that the trajectory u{t) converges 
to the trajectory u* +e~* (m(0) — u*) as t goes to infinity, and 
thus, we can conclude that u{t) — > u* . 



Because we have shown separately that both the active and 
inactive nodes converge for any initial state, it concludes our 
proof that the system is globally convergent. ■ 



Appendix B 
Proof of Lemma[T] 

Proof: Each of the four cases will be treated separately. 

(i) For any w„ G M, let z„ = sign(?I„). Since the thresh- 
olding function is non-decreasing ( |6b] i, and Tx(— ii„) = 
—Txiun) from ( |6a] l, we have: 

Z„ = sign(S„) => < ZnUn 

=> Tx {ZnU*^) < Tx (ZnUn + Z^U*^) 
=> ZnTx (u* ) < Z^Tx (u„ + W* ) 

^{)<z^[Tx{u^ + ul)~Tx{ul)] 

=> < z„a„ 

=> sign(a„) = z„ = sign(M„). 

(ii) We can separate this proof into four cases. 

If |u„+M*| < A and |w*| < A, we have: |a„| = 
\Tx{un + <) - Tx{u*^)\ = < a |«„| . If |m„ + <| < 
A and |u* | > A, according to the mean value theorem, 
since the function /(•) is continuous on [ sigii(u* )A, u* ], 
differentiable on ( sign(uJ^)A,Mjj) and /( sign(M*)A) = 
0, there exist c G ( sign(M* )A,uJ^) such that: 

|a«l = l/«)l = l/'(c)«- sign«)A)| 
= l/'(c)|(|<|-A) (since K| > A) 
<«(!<! -A) 
< a \un\ (since |m* | - |u„| < |w„ + K| < A). 

If |m„ + u*| > A and |u*| < A, accord- 
ing to the mean value theorem, there exists c G 
( sign(u„ + u* )A, Un + wJi) such that: 

\an\ = |/(u„ + 0| 

= l/'(c) {un + < - sign(u„ + OA)| 

= |/'(c)|(K + <|-A) 

< a{\un\ + Kl - A) < a\un\. 

Finally, if jun + u* | > A and |u* | > A, according to the 
mean value theorem, there exists c G (u* , «„ + w* ) such 
that |a„| = |/(u„ + O - /«)| = |/'(c)u„| < a |u„|. 
(iii) Using property (i) and (ii), we have: 



X! i""i i""i 



< > a \u„ I la„ I = a YJ 

ner ner 



^ ^ a|"n| |a„| : 
ner 

< ^ a|u„| a|u„ 



, 2~T; 



(iv) Since the soft-threshold function is non-decreasing, then 
the function .g„(s) —Tx{,s + u* ) — Txiu*^) is also non- 
decreasing. Moreover 5„(0) = and (7„(u„(t)) = an{t). 
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As a consequence, we can bound the integral by: 

gn{s)ds < {gn{Un{t)) ~ gn{0)]Un{t) = a„M„ 



As a consequence: 

Y^ / gn{s)ds < Y^ 



.JO 



n=\ 



dnUn — a U. 



n=l 



References 

[1] A. Balavoine, C. J Rozell, and J. Romberg, "Global convergence of the 
locally competitive algorithm," in 2011 IEEE Digital Signal Processing 
Workshop and IEEE Signal Processing Education Workshop (DSP/SPE). 
Jan. 2011, pp. 431^36, IEEE. 
[2] M. Elad, M. A.T. Figueiredo, and Y. Ma, "On the role of sparse and 
redundant representations in image processing," Proc. IEEE, vol. 98, 
no. 6, pp. 972-982, June 2010. 
[3] B. A. Olshausen and D. J. Field, "Sparse coding of sensory inputs," 
Current Opinion in Neurobiology, vol. 14, no. 4, pp. 481^87, Aug. 
2004, PMID: 15321069. 
[4] B. A. Olshausen and D. J. Field, "Emergence of simple-cell receptive 
field properties by learning a sparse code for natural images," Nature, 
vol. 381, no. 6583, pp. 607-609, June 1996. 
[5] B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete 
basis set: A strategy employed by VI?," Vision Research, vol. 37, no. 
23, pp. 3311-3325, Dec. 1997, PMID: 9425546. 
[6] E. J. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: 
exact signal reconstruction from highly incomplete frequency informa- 
tion," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489- 509, Feb. 2006. 
[7] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 

52, no. 4, pp. 1289-1306, Apr. 2006. 
[8] S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, "An Interior- 
Point method for Large-Scale LI -Regularized least squares," IEEE J. 
Sel. Topics Signal Process., vol. 1, no. 4, pp. 606-617, Dec. 2007. 
[9] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, "Gradient 
projection for sparse reconstruction: Application to compressed sensing 
and other inverse problems," IEEE J. Sel. Topics Signal Process., vol. 
1, no. 4, pp. 586-598, Dec. 2007. 

[10] I. Daubechies, M. Defrise, and C. De Mol, "An iterative threshold- 
ing algorithm for linear inverse problems with a sparsity constraint," 
Communications on Pure and Applied Mathematics, vol. 57, no. 11, pp. 
1413-1457, Aug. 2004. 

[11] A. Cichocki and R. Unbehauen, Neural Networks for Optimization and 
Signal Processing, Wiley, 1993. 

[12] J. J. Hopfield, "Neural networks and physical systems with emergent 
collective computational abilities," Proceedings of the National Academy 
of Sciences, vol. 79, no. 8, pp. 2554 -2558, Apr. 1982. 

[13] C. Twigg and P. Hasler, "Configurable analog signal processing," Digital 
Signal Processing, vol. 19, no. 6, pp. 904-922, Dec. 2009. 

[14] C. J. Rozell, D. H. Johnson, R. G. Baraniuk, and B. A. Olshausen, 
"Sparse coding via thresholding and local competition in neural circuits," 
Neural Computation, vol. 20, no. 10, pp. 2526-2563, Oct. 2008. 

[15] J. P Hespanha, "Uniform stability of switched linear systems: Extensions 
of LaSalle's invariance principle," IEEE Trans. Autom. Control, vol. 49, 
no. 4, pp. 470- 482, Apr. 2004. 

[16] S. Liu and J. Wang, "A simplified dual neural network for quadratic 
programming with its KWTA application," IEEE Trans. Neural Netw., 
vol. 17, no. 6, pp. 1500-1510, Nov 2006. 

[17] H. Yang and T. S. Dillon, "Exponential stability and oscillation of 
hopfield graded response neural network," IEEE Trans. Neural Netw., 
vol. 5, no. 5, pp. 719-729, Sept. 1994, PMID: 18267846. 

[18] X.-B. Liang and J. Wang, "A recurrent neural network for nonlinear 
optimization with a continuously differentiable objective function and 
bound constraints," IEEE Trans. Neural Netw., vol. 11, no. 6, pp. 1251- 
1262, Nov. 2000. 

[19] M. Forti and A. Tesi, "Absolute stability of analytic neural networks: an 
approach based on finite trajectory length," IEEE Trans. Circuits Syst. 
1, Reg. Papers, vol. 51, no. 12, pp. 2460- 2469, Dec. 2004. 

[20] W Lu and J. Wang, "Convergence analysis of a class of nonsmooth 
gradient systems," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, 
no. 11, pp. 3514-3527, Dec. 2008. 



[21] Y Xia, G. Feng, and J. Wang, "A recurrent neural network with 
exponential convergence for solving convex quadratic program and 
related linear piecewise equations," Neural Networks, vol. 17, no. 7, 
pp. 1003-1015, Sept. 2004, PMID: 15312842. 

[22] L. V. Ferreira, E. Kaszkurewicz, and A. Bhaya, "Support vector 
classifiers via gradient systems with discontinuous righthand sides," 
Neural Networks, vol. 19, no. 10, pp. 1612-1623, Dec. 2006, ACM 
ID: 1223046. 

[23] M. Forti and A. Tesi, "New conditions for global stability of neural net- 
works with application to linear and quadratic programming problems," 
IEEE Trans. Circuits Sy.<it. 1, Fundam. Theory AppL, vol. 42, no. 7, pp. 
354-366, July 1995. 

[24] L. Li and L. Huang, "Dynamical behaviors of a class of recurrent neural 
networks with discontinuous neuron activations," Applied Mathematical 
Modelling, vol. 33, no. 12, pp. 4326^336, Dec. 2009. 

[25] M. Forti, P. Nistri, and M. Quincampoix, "Generalized neural network 
for nonsmooth nonlinear programming problems," IEEE Trans. Circuits 
Syst. I, Reg. Papers, vol. 51, no. 9, pp. 1741- 1754, Sept. 2004. 

[26] M. Forti, P. Nistri, and M. Quincampoix, "Convergence of neural 
networks for programming problems via a nonsmooth Lojasiewicz 
inequality," IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1471-1486, 
Nov 2006. 

[27] X. Xue and W. Bian, "Subgradient-Based neural networks for nons- 
mooth convex optimization problems," IEEE Trans. Circuits Sy.st. I, 
Reg. Papers, vol. 55, no. 8, pp. 2378-2391, Sept. 2008. 

[28] Q. S. Liu and J. Wang, "Finite-Time convergent recurrent neural network 
with a Hard-Limiting activation function for constrained optimization 
with Piecewise-Linear objective functions," IEEE Trans. Neural Netw., 
vol. 22, no. 4, pp. 601-613, Apr 2011. 

[29] H. Lin and P. J. Antsaklis, "Stability and stabilizability of switched linear 
systems: A survey of recent results," IEEE Trans. Autom. Control, vol. 
54, no. 2, pp. 308-322, Feb. 2009. 

[30] W. Maass, "On the computational power of Winner-Take-All," Neural 
Computation, vol. 12, no. 11, pp. 2519-2535, 2000. 

[31] B. K. Natarajan, "Sparse approximate solutions to linear systems," SIAM 
Journal on Computing, vol. 24, no. 2, pp. 227-234, 1995. 

[32] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition 
by basis pursuit," SIAM Review, vol. 43, no. 1, pp. 129-159, Mar. 2001. 

[33] D. L. Donoho and J. Tanner, "Sparse nonnegative solutions of underde- 
termined linear equations by linear programming," Proceedings of the 
National Academy of Sciences of the United States of America, vol. 102, 
no. 27, pp. 9446-9451, 2005. 

[34] D. C. Knill and R. Pouget, "The Bayesian brain: The role of uncertainty 
in neural coding and computation," Trends in Neuroscience, vol. 27, pp. 
712-719, 2004. 

[35] M. Rehn and F T Sommer, "A network that uses few active neurones to 
code visual input predicts the diverse shapes of cortical receptive fields," 
Journal of Computational Neuroscience, vol. 22, no. 2, pp. 135-146, 
Apr. 2007, PMID: 17053994. 

[36] E. C. Smith and M. S. Lewicki, "Efficient auditory coding," Nature, 
vol. 439, no. 7079, pp. 978-982, Feb. 2006. 

[37] S. Fischer, R. Redondo, L. Perrinet, and G. Cristbal, "Sparse approxima- 
tion of images inspired from the functional architecture of the primary 
visual areas," EURASIP Journal on Advances in Signal Processing, vol. 
2007, pp. 1-17, 2007. 

[38] L. Perrinet, "Feature detection using spikes: The greedy approach," 
Journal of Physiology (Paris), vol. 98, no. 4-6, pp. 530-539, July 2004. 

[39] J. M Bioucas-Dias and M. A.T. Figueiredo, "A new TwIST: Two- 
Step iterative Shrinkage/Thresholding algorithms for image restoration," 
IEEE Trans. Image Process., vol. 16, no. 12, pp. 2992-3004, Dec. 2007. 

[40] F. H. Clarke, Optimization and Nonsmooth Analysis, Society for 
Industrial Mathematics, Jan. 1987. 

[41] R. A. DeCarlo, M. S. Branicky, S. Pettersson, and B. Lennartson, 
"Perspectives and results on the stability and stabilizability of hybrid 
systems," Proc. IEEE, vol. 88, no. 7, pp. 1069-1082, July 2000. 

[42] A. Bacciotti and L. Rosier, Liapunov functions and stability in control 
theory. Springer, 2005. 

[43] J. -J. Fuchs, "On sparse representations in arbitrary redundant bases," 
IEEE Trans. Inf. Theory, vol. 50, no. 6, pp. 1341-1344, June 2004. 

[44] J. Bolte, A. Daniilidis, and A. Lewis, "The Lojasiewicz inequality 
for nonsmooth subanalytic functions with applications to subgradient 
dynamical systems," SIAM Journal on Optimization, vol. 17, no. 4, pp. 
1205-1223, 2007. 

[45] E. J. Candes, J. K. Romberg, and T. Tao, "Stable signal recovery from 
incomplete and inaccurate measurements," Communications on Pure 
and Applied Mathematics, vol. 59, no. 8, pp. 1207-1223, Aug. 2006. 



